AITopics | euclidean k-means

Clustering Stable Instances of Euclidean k-means.

Neural Information Processing Systems | Mar-17-2026, 17:48:29 GMT

The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.
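The stability notion in this abstract can be probed empirically on very small instances. The sketch below is not the authors' algorithm: it finds the optimal clustering by brute force and then checks whether random perturbations of bounded radius change it. The function names, the absolute radius `eps`, and the random sampling scheme are all assumptions made for illustration, and passing the check is only evidence of stability, not a proof.

```python
import itertools
import numpy as np


def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        cluster = points[labels == c]
        if len(cluster) == 0:
            return np.inf  # require exactly k non-empty clusters
        cost += ((cluster - cluster.mean(axis=0)) ** 2).sum()
    return cost


def optimal_partition(points, k):
    """Brute-force optimal k-means partition (feasible only for tiny n)."""
    best_cost, best = np.inf, None
    for labels in itertools.product(range(k), repeat=len(points)):
        labels = np.array(labels)
        cost = kmeans_cost(points, labels, k)
        if cost < best_cost - 1e-12:
            best_cost = cost
            # Store as a set of index sets so label names do not matter.
            best = frozenset(
                frozenset(np.flatnonzero(labels == c).tolist()) for c in range(k)
            )
    return best


def looks_stable(points, k, eps, trials=50, seed=0):
    """Return False if some sampled perturbation changes the optimum.

    Each point is moved by at most eps (hypothetical absolute radius).
    """
    rng = np.random.default_rng(seed)
    reference = optimal_partition(points, k)
    for _ in range(trials):
        directions = rng.normal(size=points.shape)
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        radii = eps * rng.uniform(size=(len(points), 1))
        if optimal_partition(points + radii * directions, k) != reference:
            return False
    return True


separated = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
collinear = np.array([[0, 0], [1, 0], [2, 0]], float)
print(looks_stable(separated, k=2, eps=0.5))
print(looks_stable(collinear, k=2, eps=0.3))
```

In this toy setup the two well-separated groups keep the same optimal clustering under every sampled perturbation, while the three evenly spaced points have two tied optima, so small perturbations decide which one wins.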

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)


Clustering Stable Instances of Euclidean k-means.

Neural Information Processing Systems | Nov-21-2025, 16:06:54 GMT

The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.

design efficient algorithm, euclidean k-means, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)


Reviews: Clustering Stable Instances of Euclidean k-means.

Neural Information Processing Systems | Oct-8-2024, 10:07:00 GMT

The authors propose a notion of additive perturbation stability (APS) for Euclidean k-means, under which the optimal k-means clustering solution is maintained when each point in the data is moved by a sufficiently small Euclidean distance. I think the paper is rather interesting; however, the results of the paper are not very surprising. Here are my comments regarding the paper: (1) To my understanding, the results of Theorem 1.2 are only under the condition of APS. They only hold for the case of k = 2 components and may lead to exponential dependence on k for large k. However, under the additional margin condition between any pair of clusters, we will be able to guarantee the existence of an algorithm that is polynomial in k.

algorithm, euclidean distance, euclidean k-means, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)


Clustering Stable Instances of Euclidean k-means.

Aravindan Vijayaraghavan, Abhratanu Dutta, Alex Wang

Neural Information Processing Systems | Oct-4-2024, 07:02:19 GMT

The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.

algorithm, perturbation stability, stability, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)


$k$-means on Positive Definite Matrices, and an Application to Clustering in Radar Image Sequences

Fryer, Daniel, Nguyen, Hien, Castellazzi, Pascal

arXiv.org Machine Learning | Aug-25-2020

... correspond bijectively to mean-centered Gaussian distributions, and are used to model Brownian motion in Diffusion Tensor Imaging (DTI), where they are referred to as tensors [1]. The finite-lag autocovariance matrices of time series are SPD, and have been used in compression-based clustering ... However, performing k-means on SPD matrices may be difficult without a computationally efficient form for the Fréchet mean [13]. In Section II, we introduce the log-Cholesky distance and closed-form expression for the corresponding Fréchet mean.
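As a rough illustration of why a closed-form Fréchet mean matters for k-means on SPD matrices, the sketch below follows the commonly stated log-Cholesky construction: take the Cholesky factor, compare strictly lower-triangular parts directly and diagonals on a log scale. This is an assumption about the definition, not a transcription of the listed paper, and the function names are hypothetical.

```python
import numpy as np


def _split(spd):
    """Cholesky factor split into strictly lower part and diagonal."""
    factor = np.linalg.cholesky(spd)
    return np.tril(factor, k=-1), np.diag(factor)


def log_cholesky_distance(p, q):
    """Distance between two SPD matrices under the assumed log-Cholesky metric."""
    strict_p, diag_p = _split(p)
    strict_q, diag_q = _split(q)
    return np.sqrt(
        ((strict_p - strict_q) ** 2).sum()
        + ((np.log(diag_p) - np.log(diag_q)) ** 2).sum()
    )


def log_cholesky_mean(mats):
    """Closed-form mean: average strict parts, geometric-average diagonals."""
    parts = [_split(m) for m in mats]
    strict = np.mean([s for s, _ in parts], axis=0)
    diag = np.exp(np.mean([np.log(d) for _, d in parts], axis=0))
    factor = strict + np.diag(diag)
    return factor @ factor.T


a = np.eye(2)
b = 4.0 * np.eye(2)
center = log_cholesky_mean([a, b])
print(center)
print(log_cholesky_distance(a, center), log_cholesky_distance(b, center))
```

With a mean of this form, the centroid-update step of a Lloyd-style iteration needs no inner optimization loop; only the assignment step uses the distance function.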

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

2008.03454

Country:

Oceania > Australia > South Australia (0.04)
Oceania > Australia > Queensland (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Alaska (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)


Clustering Stable Instances of Euclidean k-means.

Vijayaraghavan, Aravindan, Dutta, Abhratanu, Wang, Alex

Neural Information Processing Systems | Feb-14-2020, 18:57:58 GMT

The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance).

design efficient algorithm, euclidean k-means, stability, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.62)


Clustering Stable Instances of Euclidean k-means.

Vijayaraghavan, Aravindan, Dutta, Abhratanu, Wang, Alex

Neural Information Processing Systems | Dec-31-2017

The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)
